add NCCL support to add_ghost_cells and operators in /basicoperators #137
Conversation
@tharittk, very good start! I am going to reply to some of your comments/questions and will look more closely at your code in the next few days.
I see. For the third point about passing … If somehow the …
The most recent commits reflect some changes I want to point out
@tharittk very good!
Everything makes sense to me and I agree with the code changes. I just left some minor suggestions. After you have taken those into account, I think we can merge this PR 😄
One more minor thing: as you progress, don't forget to keep the table in `gpu.rst` up to date.
```python
local_shapes = None
global_shape = getattr(self, "dims")
arr = DistributedArray(global_shape=global_shape,
                       base_comm_nccl=x.base_comm_nccl,
```
Since we are changing this, I think it would be safe to also pass `base_comm=x.base_comm`. I think in the past this never led to any issue, as we probably always used `MPI.COMM_WORLD`, but it's good not to assume this will always be the case 😄 (@rohanbabbar04, agree?)
```diff
- base_comm_nccl=x.base_comm_nccl,
+ base_comm=x.base_comm,
+ base_comm_nccl=x.base_comm_nccl,
```
```python
__all__ = [
    "initialize_nccl_comm",
    "nccl_split",
    "nccl_allgather",
```
I think it may be good to add all `nccl_*` methods to the Utils section of https://github.com/PyLops/pylops-mpi/blob/main/docs/source/api/index.rst
Alright, I can do that. Maybe in the other PR?
Sounds good. If it is something very small like this, we can go for the same PR; if it is something a bit more substantial, like the changes you made previously, it is good practice to have a separate documentation-only PR 😄
Since the PR involves adding NCCL support to the …
@tharittk I have reviewed the new additions and they look great to me!
There are still a few conversations to resolve (for one of them we can hopefully get @rohanbabbar04's opinion), and I would like to hear from @rohanbabbar04 if he has any general comment, after that I will merge.
Sorry, I missed your messages. I'll review the PR in a day or two and share my comments.
In all cases, I would use `base_comm=x.base_comm` along with `base_comm_nccl=x.base_comm_nccl` everywhere, since we handle both cases. This should not cause any issues, as `base_comm` changes to `MPI.COMM_WORLD` when `base_comm_nccl` is not None.
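As a toy illustration of this point (plain-Python stand-ins, not the actual pylops-mpi classes), forwarding both communicators means a derived array inherits whatever setup `x` has, rather than silently falling back to the world communicator:

```python
# Plain-Python stand-ins for the pattern discussed above; ToyArray is NOT the
# real pylops-mpi DistributedArray, only an illustration of its constructor
# arguments base_comm / base_comm_nccl.

WORLD = "MPI.COMM_WORLD"  # stand-in for the default MPI communicator

class ToyArray:
    def __init__(self, global_shape, base_comm=WORLD, base_comm_nccl=None):
        self.global_shape = global_shape
        self.base_comm = base_comm
        self.base_comm_nccl = base_comm_nccl

def output_like(x, global_shape):
    # forward BOTH communicators from the input array, so the output
    # communicates the same way x does (NCCL if x uses NCCL, and x's own
    # MPI communicator otherwise, not necessarily the world communicator)
    return ToyArray(global_shape,
                    base_comm=x.base_comm,
                    base_comm_nccl=x.base_comm_nccl)

x = ToyArray((4, 4), base_comm="sub_comm", base_comm_nccl="nccl_comm")
y = output_like(x, (4,))
print(y.base_comm, y.base_comm_nccl)  # sub_comm nccl_comm
```

If only `base_comm_nccl` were forwarded, a `y` created from an `x` living on a sub-communicator would quietly revert to the default world communicator, which is exactly the assumption being questioned here.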
```python
local_shapes = None
global_shape = getattr(self, "dims")
arr = DistributedArray(global_shape=global_shape,
                       base_comm_nccl=x.base_comm_nccl,
```
```diff
- base_comm_nccl=x.base_comm_nccl,
+ base_comm=x.base_comm,
+ base_comm_nccl=x.base_comm_nccl,
```
Thanks for the review @rohanbabbar04 @mrava87!
Great, I am going to merge this. @tharittk great work 😄
I added NCCL support to `MPIVStack`. Along the way, I discovered some interesting issues that may be worth discussing, so I opened this draft PR.
Changes Made
DistributedArray.py
The `@reshape` decorator calls `add_ghost_cells()` in this file, so `add_ghost_cells()` has to be modified to support NCCL. There are two points I want to raise:

- `self._allgather(cell_fronts)`: `cell_fronts` is metadata and small in size (a list of ints, size = total ranks). Under the current implementation, it dispatches to NCCL if NCCL is enabled. Should we enforce it to always use MPI instead?
- `self.base_comm.send`: this sends the ghost cells to a peer, and the sent array can be large. Under the current code, it always uses MPI. If we want it to use NCCL, we will need to implement point-to-point NCCL. Luckily, CuPy's NCCL bindings support this; I think it is just a matter of adding more calls in `_nccl.py`.

VStack.py
- Added `base_comm_nccl`, just like what we did in `DistributedArray`. I did not change the `MPILinearOperator` interface to take this argument, but I don't have a strong opinion on either choice.
- `y` from `Op @ x` and `Op.H @ x` is now initialized with the same `base_comm_nccl` as `x`, i.e., if `x` lives in NCCL, `y` should live and communicate with NCCL too.

test_stack_nccl.py
- `HStack` operator
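The two-step communication pattern inside `add_ghost_cells()` described in the PR text above can be sketched as a single-process simulation (all names here are illustrative stand-ins, not the pylops-mpi internals): step 1 is the small allgather of per-rank metadata, the part that dispatches to NCCL when enabled; step 2 is the potentially large neighbour-to-neighbour transfer of the ghost cells themselves, the part that currently stays on MPI and would need point-to-point NCCL.

```python
# Toy, single-process simulation of the add_ghost_cells() communication
# pattern. `local_arrays` plays the role of each rank's local chunk.

def add_ghost_cells_sim(local_arrays, cells_front=1):
    """Prepend `cells_front` ghost entries taken from the previous rank."""
    nranks = len(local_arrays)
    # Step 1: "allgather" the metadata -- a small list of ints, one per rank
    # (the role of self._allgather(cell_fronts); NCCL vs MPI is debated here)
    cell_fronts = [cells_front if r > 0 else 0 for r in range(nranks)]
    # Step 2: "send/recv" the actual ghost cells between neighbouring ranks
    # (the role of self.base_comm.send; the payload can be large, so this is
    # where point-to-point NCCL would matter)
    ghosted = []
    for r, arr in enumerate(local_arrays):
        if cell_fronts[r] > 0:
            ghost = local_arrays[r - 1][-cell_fronts[r]:]  # peer's back cells
            ghosted.append(ghost + list(arr))
        else:
            ghosted.append(list(arr))
    return cell_fronts, ghosted

fronts, out = add_ghost_cells_sim([[0, 1], [2, 3], [4, 5]])
print(fronts)  # [0, 1, 1]
print(out)     # [[0, 1], [1, 2, 3], [3, 4, 5]]
```

The simulation makes the asymmetry concrete: step 1 moves `nranks` integers regardless of problem size, while step 2 moves slices of the actual data, which is why the two steps can reasonably use different transports.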